ABSTRACT

This paper investigates the biases of the residuals and the maximum likelihood
parameter estimates from standard, normal-theory nonlinear regression
models. Emphasis is placed on determining the influence of individual cases
on the biases and on understanding how the residual biases can affect the
usefulness of standard diagnostic methods. It is shown that the various bias
expressions in the literature are equivalent, that the biases in nonlinear
regression can be studied usefully in the context of linear regression, and
that diagnostic plots using residuals can be misleading because of substantial
residual biases. For a class of partially nonlinear models, it is shown that
the maximum intrinsic curvature (Bates and Watts 1980) is closely related to
the residual expectations. Finally, the model associated with power
transformations of single explanatory variables in linear regression is
investigated in further detail and several numerical illustrations are
presented.

SIGNIFICANCE AND EXPLANATION

The  statistical  analysis  of  a  collection  of  data  is  usually  based  on  a 
specified  model,  a  mathematical  formula  describing  the  behavior  of  the  data  up 
to  a  few  unknown  parameters  which  are  to  be  estimated  from  the  data.  Much  is 
known  about  the  statistical  behavior  of  parameter  estimates  and  other 
important  statistics  that  arise  from  analyses  based  on  models  that  are  linear 
functions  of  the  parameters,  while  relatively  little  is  known  when  the 
underlying  model  is  nonlinear  in  the  parameters. 

The  purpose  of  this  paper  is  to  study  the  bias  of  the  parameter  estimates 
and  other  important  statistics  that  stem  from  analyses  based  on  nonlinear 
models  with  normal  errors. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  authors  of  this  report. 


BIAS  IN  NONLINEAR  REGRESSION 
R.D. Cook, C.L. Tsai* and B.C. Wei*
1.  INTRODUCTION 


It  is  well  known  that  the  maximum  likelihood  estimators  of  the  parameters  in  standard 
nonlinear  regression  models  are  generally  biased  estimators  of  the  true  parameter 
values.  Other  quantities  that  are  useful  in  analyses  of  nonlinear  regression  models  can 
be  characterized  in  the  same  way.  The  ordinary  residuals,  for  example,  generally  have 
nonzero  expectations  and  in  this  sense  are  also  biased. 

The  past  studies  of  the  various  biases  in  nonlinear  regression  are  certainly  useful, 
but  they  do  not  fully  exploit  recognizable  aspects  of  the  structure  of  the  problem  in  a 
way  that  might  facilitate  understanding;  nor  do  they  emphasize  points  of  view  that  provide 
for  an  appreciation  of  the  potential  importance  of  the  biases  in  practice.  The  effects  of 
residual  biases  on  standard  diagnostic  plots,  for  example,  are  apparently  unknown.  In 
this  paper  we  describe  a  relatively  simple  structure  for  investigating  the  nature  of  the 
parameter  and  residual  biases  in  nonlinear  regression. 

In  Section  2,  we  briefly  review  the  past  results  on  the  parameter  bias,  discuss  their 
interpretation  and  use  in  practice,  and  show  that  individual  cases  can  have  a  substantial 
influence  on  the  bias.  In  Section  3  we  investigate  the  expectation  of  the  residual  vector 
and  related  quantities  such  as  the  change  in  the  residual  expectations  that  result  from 
case  deletion.  In  particular,  we  show  that  bias  in  nonlinear  regression  can  be  usefully 
studied  in  the  context  of  linear  regression. 

The transition from linear to fully nonlinear models often seems quite abrupt. This
transition can be smoothed by investigating intermediate, partially nonlinear models. In
Section 4, we study a special class of partially nonlinear models that occurs frequently
in practice and covers many of the illustrative examples in the statistical literature.
For this class of models, interesting relationships occur between the residuals and other
diagnostic statistics. The expectations of the residuals, for example, are shown to be
closely related to the Bates-Watts measure of intrinsic curvature. In the remainder of
this section, we establish notation and briefly review relevant background material.

The standard nonlinear regression model can be represented as

$$ y_i = f(x_i, \theta) + \varepsilon_i, \qquad i = 1,2,\ldots,n \tag{1} $$

where $x_i$ represents a vector of known explanatory variables associated with the i-th
observable response $y_i$, $\theta$ is a p-vector of unknown parameters, the response function
$f$ is assumed to be known, continuous and twice differentiable in $\theta$, and the errors
$\varepsilon_i$ are assumed to be independent, identically distributed normal random variables
with mean 0 and variance $\sigma^2$. For this model, the maximum likelihood estimator
$\hat{\theta}$ of $\theta$ can be found by minimizing the objective function

$$ J(\theta) = \sum_{i=1}^{n} \{y_i - f(x_i, \theta)\}^2 . \tag{2} $$



Kennedy and Gentle (1980) discuss computational methods for obtaining $\hat{\theta}$; for our
purposes we assume that $\hat{\theta}$ is available. The asymptotic behavior of $\hat{\theta}$ is
investigated by Wu (1981) who provides additional references. The usual estimator of
$\sigma^2$ is $s^2 = J(\hat{\theta})/(n - p)$.

For notational convenience, let $f_i = f(x_i, \theta)$, $i = 1,2,\ldots,n$, and let $V$ denote
the n x p matrix with elements $f_i^r = \partial f_i/\partial\theta_r$, $i = 1,2,\ldots,n$,
$r = 1,2,\ldots,p$. Unless indicated otherwise, all derivatives are evaluated at the true
parameter values. Various quadratic expansions used in the following sections involve the
p x p matrices $W_i$, $i = 1,2,\ldots,n$; the elements of $W_i$ are
$f_i^{rs} = \partial^2 f_i/\partial\theta_r\partial\theta_s$, $r,s = 1,2,\ldots,p$.


2. BIAS OF $\hat{\theta}$

Numerous approximations for the bias vector $E(\hat{\theta}) - \theta$ are available in the
literature. Cox and Snell (1968) derive an order $n^{-1}$ approximation for the bias in an
expanded class of models that includes (1) as a special case. Their derivation is based
on a quadratic expansion of the likelihood equations and thus requires third partial
derivatives of the log likelihood. Box (1971) used a quadratic expansion of the residuals
to obtain an approximation for the bias in a multivariate version of (1). For the single
response case, Bates and Watts (1980) provide a connection between Box's bias
approximation and their parameter-effects curvature array. More recently, further
approximations for the bias have been offered by Clarke (1980), Hougaard (1981) and Amari
(1982). Clarke and Hougaard deal specifically with model (1) while Amari considers curved
exponential families that include (1) as a special case. Clarke's derivation of the bias
approximation is similar in spirit to that of Cox and Snell (1968), while Hougaard's
result is apparently a straightforward application of Skovgaard (1981).

The  derivations  of  the  various  bias  approximations  are  based  on  different  notations, 
approaches  and  degrees  of  generality.  Clarke  (1980)  mentions  that  his  result  agrees  with 
that  of  Box  (1971),  but  otherwise  the  relationships  between  these  approximations  are 
apparently  unknown.  As  shown  in  the  Appendix,  these  bias  approximations  are,  in  fact, 
identical  for  model  (1).  Further,  it  is  shown  that,  apart  from  differences  in  notation, 
the  Bates-Watts  (1980)  form  for  the  bias  is  the  same  as  that  given  by  Clarke  (1980). 

A useful numerical study of the bias in various yield-density models is given by
Gillis and Ratkowsky (1978; see also Ratkowsky, 1983) who conclude that the actual bias
(as obtained through simulation) is accurately reflected by the now common bias
approximation.

Since the bias forms discussed above are all equivalent, any of them can be used as
an aid in nonlinear regression analyses. For the purposes of interpretation, however, it
is helpful to exploit the relationship with linear regression: From the Appendix it
follows that the bias approximation $b = E(\hat{\theta}) - \theta$ can be expressed as

$$ b = (V^T V)^{-1} V^T d \tag{3} $$

where $d$ is an n-vector with elements $d_i = -\sigma^2\,\mathrm{tr}[(V^T V)^{-1} W_i]/2$,
$i = 1,2,\ldots,n$. Thus, the bias $b$ is simply the vector of coefficients from the ordinary
least squares regression of $d$ on the columns of $V$. Further, $d$ is essentially the expected
difference between linear and quadratic approximations of the estimated response function.
To see this, let $f(\theta)$ denote the n-vector with elements $f_i$, $i = 1,2,\ldots,n$, and
expand $f(\hat{\theta})$ about $\theta$,

$$ f(\hat{\theta}) \approx f(\theta) + V(\hat{\theta} - \theta) + \tfrac{1}{2}(\hat{\theta} - \theta)^T W (\hat{\theta} - \theta) \tag{4} $$

where $W$ is an n x p x p array with i-th face $W_i$, $i = 1,2,\ldots,n$. Multiplication
involving three dimensional arrays is defined as in Bates and Watts (1980) so that the
third term of (4) is an n-vector with elements
$(\hat{\theta} - \theta)^T W_i (\hat{\theta} - \theta)/2$, $i = 1,2,\ldots,n$.
The expected difference between the linear and quadratic approximations of $f(\hat{\theta})$ is thus

$$ -\tfrac{1}{2} E\{(\hat{\theta} - \theta)^T W (\hat{\theta} - \theta)\} \approx d . \tag{5} $$

These results indicate that the bias will be small if the elements of $d$ are
sufficiently close to zero, so that the model is essentially linear, or if $d$ is
orthogonal to the tangent plane, i.e. the column space of $V$.
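
The regression interpretation of (3) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the Michaelis-Menten response $f = \theta_1 x/(\theta_2 + x)$ used later in Section 5, and the design points, parameter values and $\sigma$ are hypothetical values chosen only for illustration. It forms $V$ and the faces $W_i$ analytically, computes $d$, obtains $b$ by ordinary least squares, and compares the result with the Box-type sum $-(\sigma^2/2)(V^TV)^{-1}\sum_i v_i\,\mathrm{tr}[(V^TV)^{-1}W_i]$.

```python
import numpy as np

def mm_derivatives(x, theta):
    """Derivative arrays of f = theta1 * x / (theta2 + x) with respect to theta."""
    t1, t2 = theta
    den = t2 + x
    V = np.column_stack([x / den, -t1 * x / den**2])   # n x p first derivatives
    W = np.zeros((x.size, 2, 2))                       # n x p x p second derivatives
    W[:, 0, 1] = W[:, 1, 0] = -x / den**2              # d2f / dtheta1 dtheta2
    W[:, 1, 1] = 2.0 * t1 * x / den**3                 # d2f / dtheta2^2
    return V, W

def bias_by_regression(V, W, sigma2):
    """Return d and the bias b of (3) as the OLS coefficients of d on V."""
    M = np.linalg.inv(V.T @ V)
    # d_i = -(sigma^2 / 2) * tr(M W_i)
    d = -0.5 * sigma2 * np.einsum("rs,isr->i", M, W)
    b = np.linalg.lstsq(V, d, rcond=None)[0]
    return d, b

# Hypothetical design and parameter values (assumptions, not the paper's data).
x = np.array([0.2, 0.25, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2.0])
theta = (0.1, 1.5)
sigma2 = 0.005**2

V, W = mm_derivatives(x, theta)
d, b = bias_by_regression(V, W, sigma2)

# Direct evaluation of the Box-type sum for comparison.
M = np.linalg.inv(V.T @ V)
b_box = -0.5 * sigma2 * M @ sum(V[i] * np.trace(M @ W[i]) for i in range(x.size))

# Residuals of the same regression give the approximate residual bias (I - P_V) d.
residual_bias = d - V @ b
```

Because `residual_bias` is a vector of least squares residuals, it is orthogonal to the columns of `V` up to rounding error.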

Because $b$ can be simply interpreted as the coefficient vector from an ordinary
least squares regression, we can now apply a variety of the diagnostic methods available
in linear regression to a study of the bias in nonlinear regression. In particular,
added variable plots (Cook and Weisberg, 1982) for the components of $b$, or $\hat{b}$ obtained
by substituting estimates for parameters, may prove to be particularly useful. Further,
the form of $b$ allows for the rather straightforward development of methods for
investigating the effects of individual cases on the determination of $b$ or $\hat{b}$.

To investigate the influence of individual cases on $b$, some additional notation is
required. The subscript (i) means "with the i-th case deleted" so that, for example,
$V_{(i)}$ is the (n-1) x p matrix formed by deleting the i-th row of $V$ and

$$ b_{(i)} = (V_{(i)}^T V_{(i)})^{-1} V_{(i)}^T d_{(i)} . \tag{6} $$

To display the effects of deleting the i-th case, we express $b_{(i)}$ as a function of the
full data, as is usual in this kind of investigation. The general methods for doing this
are the same as those used in linear regression, but an added complication arises here
since all the elements of $d$ change when a case is deleted.

Define the n-vector $\delta_i = (\delta_{ij})$ with elements

$$ \delta_{ij} = -\frac{\sigma^2}{2}\,\frac{v_i^T (V^T V)^{-1} W_j (V^T V)^{-1} v_i}{1 - h_i}, \qquad j = 1,2,\ldots,n \tag{7} $$

where $v_i^T$ is the i-th row of $V$ and $h_i$ is the i-th diagonal element of the tangent
plane hat matrix $P_V = V(V^T V)^{-1}V^T$. Further, let $\delta_i^i$ and $d^i$ denote (n - 1) x 1
vectors formed by deleting the i-th components from $\delta_i$ and $d$, respectively. Then
$d_{(i)} = d^i + \delta_i^i$ and it follows that

$$ b_{(i)} = b + b_i^* - \frac{(V^T V)^{-1} v_i a_i}{1 - h_i} \tag{8} $$

where $b_i^*$ is the coefficient vector from the ordinary least squares regression of
$\delta_i$ on $V$, and $a_i$ is the i-th residual from the regression of $d + \delta_i$ on $V$.


Equation (8) has several general features in common with well known, analogous
expressions from linear regression. For example, the tangent plane leverage components
$h_i$ can be interpreted in much the same way as the leverage components from linear
regression so that remote points on the tangent plane can have a substantial influence on
the bias. The vector $\delta_i$ reflects the change in the expected difference between linear
and quadratic approximations of $f(\hat{\theta})$ when the i-th case is deleted. If $h_i$ is large
then individual components of $\delta_i$ may be large and removal of the i-th case may result in
a substantial change in the agreement between the linear and quadratic approximations.
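
The case-deletion expression can be verified against a direct recomputation. The sketch below is illustrative only: the arrays `V` and `W` are arbitrary random placeholders rather than derivatives of any particular model, and the adjustment vector for the deleted case is taken to be $-(\sigma^2/2)\,v_i^T(V^TV)^{-1}W_j(V^TV)^{-1}v_i/(1-h_i)$, which is the form implied by the rank-one update of $(V^TV)^{-1}$. The function names are hypothetical.

```python
import numpy as np

def d_vector(V, W, sigma2):
    """d_j = -(sigma^2 / 2) * tr[(V^T V)^{-1} W_j]."""
    M = np.linalg.inv(V.T @ V)
    return -0.5 * sigma2 * np.einsum("rs,jsr->j", M, W)

def ols(V, z):
    """Ordinary least squares coefficients of z on the columns of V."""
    return np.linalg.lstsq(V, z, rcond=None)[0]

def deleted_bias_from_full_data(V, W, sigma2, i):
    """Bias with case i deleted, written in terms of full-data quantities."""
    M = np.linalg.inv(V.T @ V)
    v_i = V[i]
    h_i = v_i @ M @ v_i                       # tangent plane leverage
    u = M @ v_i
    # Change in every element of d caused by deleting case i.
    delta = -0.5 * sigma2 * np.einsum("r,jrs,s->j", u, W, u) / (1.0 - h_i)
    d = d_vector(V, W, sigma2)
    b = ols(V, d)
    b_delta = ols(V, delta)
    # i-th residual from the regression of d + delta on V.
    a_i = (d + delta)[i] - v_i @ (b + b_delta)
    return b + b_delta - u * a_i / (1.0 - h_i)

def deleted_bias_direct(V, W, sigma2, i):
    """Bias recomputed from scratch after removing case i."""
    keep = np.arange(V.shape[0]) != i
    return ols(V[keep], d_vector(V[keep], W[keep], sigma2))

# Arbitrary placeholder arrays (assumptions for illustration only).
rng = np.random.default_rng(0)
n, p, sigma2 = 10, 3, 0.25
V = rng.normal(size=(n, p))
W = rng.normal(size=(n, p, p))
W = 0.5 * (W + W.transpose(0, 2, 1))          # each face must be symmetric

max_gap = max(
    np.abs(deleted_bias_from_full_data(V, W, sigma2, i)
           - deleted_bias_direct(V, W, sigma2, i)).max()
    for i in range(n)
)
```

Here `max_gap` is the largest absolute discrepancy between the two computations over all deleted cases; it should be at the level of rounding error.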


3. RESIDUAL BIAS

Although the errors in model (1) have expectation zero, the expectations of the
residuals $e_i = y_i - f(x_i, \hat{\theta})$, $i = 1,2,\ldots,n$, are generally nonzero and in this
sense the residuals are biased.

Taking the expectation on both sides of (4), using $b$ to approximate $E(\hat{\theta}) - \theta$, and
using $E(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T = \sigma^2 (V^T V)^{-1}$ yields the following
approximation $E$ for the expectation of the residual vector $e = (e_i)$,

$$ E = (I - P_V)\,d . \tag{9} $$

This approximation agrees with the results of Cox and Snell (1968).



A useful pattern appears from (3) and (9): From (3) the parameter bias $b$ is simply
the coefficient vector from the ordinary least squares regression of $d$ on $V$, while from
(9) we see that $E$ is just the vector of residuals from this same regression. Other
useful analogies with linear regression can be developed as well: Suppose that
$f(x, \hat{\theta})$ is to be used as an estimate of $f_x = f(x, \theta)$ at a point $x$ that does
not occur in the data. Let $v_x = \partial f_x/\partial\theta$ and
$W_x = \partial^2 f_x/\partial\theta^2$. Then $f(x, \theta) - E f(x, \hat{\theta}) = d_x - v_x^T b$ where
$d_x = -\sigma^2\,\mathrm{tr}[(V^T V)^{-1} W_x]/2$ is the "response" at $x$ and $v_x^T b$ is the
"estimate" at $x$ from the regression of $d$ on $V$. In such problems, it may be useful to
plot $d_x - v_x^T b$ as a function of $x$ to find if there are regions of substantial bias.

Bates and Watts (1980) express $b$ in terms of the diagonal elements of the faces of
the parameter-effects curvature array $A^T$. Similarly, $E$ can be expressed in terms of
an orthonormal basis for the null space of $V$ and the diagonal elements of the intrinsic
curvature array $A^N$. Let $V = QR$ denote the QR decomposition of $V$ and partition
$Q = [U, N]$ where the columns of the n x (n - p) matrix $N$ form an orthonormal basis
for the null space of $V$. Then

$$ E = -\frac{\sigma^2}{2}\, N \sum_{i=1}^{p} a_i \tag{10} $$

where $a_i$ is an (n - p) x 1 vector with elements $a_{ij}$, $j = 1,2,\ldots,n-p$, and $a_{ij}$
is the i-th diagonal element of the j-th face of the intrinsic curvature array $A^N$. (In
forming (10) we have used the original rather than scaled data as in Bates and Watts,
1980.) Since $E$ depends only on $N$ and the elements of the intrinsic curvature array,
it is invariant under transformations of the parameters, as expected.

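The influence of the i-th case on the determination of $E$ can be found by using the
results on $b_{(i)}$ from Section 2. Let $e_{j(i)}$ denote the j-th residual based on the data
without the i-th case, $j = 1,2,\ldots,n$, and let $d_j$ and $\delta_{ij}$ denote the j-th elements of
$d$ and $\delta_i$, respectively. Then $E(e_{j(i)}) = d_j + \delta_{ij} - v_j^T b_{(i)}$ and using (8) it
follows that the change in the residual expectation is

$$ E(e_{j(i)} - e_j) = \delta_{ij} - v_j^T b_i^* + \frac{h_{ij}\, a_i}{1 - h_i} \tag{11} $$

where $h_{ij}$ is the ij-th element of $P_V$ and, as in (8), $b_i^*$ is the coefficient vector
from the regression of $\delta_i$ on $V$ and $a_i$ is the i-th residual from the regression of
$d + \delta_i$ on $V$. When i = j equation (11) gives the change in bias of $f(x_i, \hat{\theta})$
when the i-th case is deleted,

$$ E(e_{i(i)} - e_i) = \frac{\delta_{ii} - v_i^T b_i^*}{1 - h_i} + \frac{h_i}{1 - h_i}\, E(e_i) . \tag{12} $$

Clearly, remote points on the tangent plane can have a substantial influence on the
average behavior of the residuals. Notice also that $\delta_{ii} - v_i^T b_i^*$ is simply the i-th
residual from the regression of $\delta_i$ on $V$.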


4.  SPECIAL  MODELS 


4.1  Partially  Nonlinear  Models. 

Further insight into the behavior of the bias can be gained by considering the
partially nonlinear response function

$$ f(\theta) = Z\alpha + \beta\, g(\gamma) \tag{13} $$

where $Z$ is a known, full rank n x (p - 2) matrix, $\theta^T = (\alpha^T, \beta, \gamma)$ and, as
indicated, $\beta$ and $\gamma$ are scalars. This class of response functions occurs often in
practice and in the statistical literature; see, for example, Bates and Watts (1980),
Gallant (1975), V^lund (1978) and Stone (1980). In particular, (13) allows for
transformation of a single explanatory variable in linear regression. Let $g$, $g'$ and
$g''$ denote n-vectors with elements $g_i$, $\partial g_i/\partial\gamma$ and
$\partial^2 g_i/\partial\gamma^2$, respectively. Application of the results of Section 2 yields

$$ V = [Z,\ g(\gamma),\ \beta g'(\gamma)] \tag{14} $$

$$ d = -\mathrm{cov}(\hat{\beta}, \hat{\gamma})\, g'(\gamma) - \tfrac{1}{2}\,\beta\, \mathrm{var}(\hat{\gamma})\, g''(\gamma) \tag{15} $$

and

$$ b = -\beta^{-1} \mathrm{cov}(\hat{\beta}, \hat{\gamma})\, \iota_p - \tfrac{1}{2}\,\beta\, \mathrm{var}(\hat{\gamma})\, (V^T V)^{-1} V^T g''(\gamma) \tag{16} $$

where $\iota_p$ is the p-th standard basis vector, and $\mathrm{var}(\hat{\gamma})$ and
$\mathrm{cov}(\hat{\beta}, \hat{\gamma})$ are the indicated large sample variance and covariance, i.e.
the appropriate elements of $\sigma^2 (V^T V)^{-1}$.

Notice that $(V^T V)^{-1} V^T g''$ is simply the coefficient vector from the regression of
$g''$ on $V$, and that $\mathrm{cov}(\hat{\beta}, \hat{\gamma})$ contributes only to the bias of
$\hat{\gamma}$.

The behavior of $b$ as a function of $\gamma$ is complicated and difficult to predict.
For the remaining parameters, however, several useful conclusions can be obtained from
(16). First, $b$ does not depend on $\alpha$, as expected. Second, as a function of
$\beta$ and $\sigma^2$, the bias of each of the components of $\hat{\alpha}$ and the bias of
$\hat{\beta}$ are proportional to $\sigma^2/\beta$ since $\mathrm{var}(\hat{\gamma})$ is proportional to
$\sigma^2/\beta^2$. Third, as a function of $\beta$ and $\sigma^2$, the bias of $\hat{\gamma}$ is
proportional to $\sigma^2/\beta^2$ since $\mathrm{cov}(\hat{\beta}, \hat{\gamma})$ is proportional to
$\sigma^2/\beta$. Clearly, values of $\beta$ close to zero can result in substantially
biased estimators while for sufficiently large values of $\beta$ the bias will be
negligible. This reinforces the usual empirical notion that a transformation of an
explanatory variable in linear regression will be well determined only when the regression
coefficient $\beta$ is large relative to $\sigma$.
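
The closed form for the bias in the partially nonlinear model and its dependence on $\beta$ can be illustrated numerically. This is a sketch under stated assumptions: $Z$ is taken to be a single column of ones, and $g(\gamma)$ is taken to have elements $\exp(-\gamma x_i)$, a hypothetical choice made only so that $g'$ and $g''$ are simple; the design and parameter values are likewise invented. The code evaluates $d$ from (15), computes $b$ both by the regression (3) and by the closed form (16), and repeats the calculation with $\beta$ doubled.

```python
import numpy as np

def partial_model_bias(x, beta, gamma, sigma2):
    """Bias for f = alpha + beta * g(gamma), with the assumed g_i = exp(-gamma * x_i)."""
    g = np.exp(-gamma * x)
    g1 = -x * g                 # dg / dgamma
    g2 = x**2 * g               # d2g / dgamma2
    V = np.column_stack([np.ones_like(x), g, beta * g1])       # equation (14)
    M = np.linalg.inv(V.T @ V)
    var_gamma = sigma2 * M[-1, -1]          # large sample var(gamma_hat)
    cov_beta_gamma = sigma2 * M[-2, -1]     # large sample cov(beta_hat, gamma_hat)
    d = -cov_beta_gamma * g1 - 0.5 * beta * var_gamma * g2     # equation (15)
    b_regression = np.linalg.lstsq(V, d, rcond=None)[0]        # equation (3)
    iota_p = np.zeros(V.shape[1])
    iota_p[-1] = 1.0
    b_closed = (-cov_beta_gamma / beta * iota_p
                - 0.5 * beta * var_gamma * (M @ V.T @ g2))     # equation (16)
    return b_regression, b_closed

# Hypothetical design and parameter values.
x = np.linspace(0.5, 5.0, 12)
gamma, sigma2 = 0.5, 0.04

b_reg_1, b_closed_1 = partial_model_bias(x, beta=1.0, gamma=gamma, sigma2=sigma2)
b_reg_2, b_closed_2 = partial_model_bias(x, beta=2.0, gamma=gamma, sigma2=sigma2)

# Ratios of the biases at beta = 2 to those at beta = 1, ordered (alpha, beta, gamma).
ratios = b_reg_2 / b_reg_1
```

Doubling $\beta$ should halve the biases of $\hat{\alpha}$ and $\hat{\beta}$ and quarter the bias of $\hat{\gamma}$, in line with the proportionality statements above.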

The second-order expectation of $e$ for model (13) follows immediately from (9),

$$ E = -\tfrac{1}{2}\,\beta\, \mathrm{var}(\hat{\gamma})\, (I - P_V)\, g''(\gamma) . \tag{17} $$

As a function of $\beta$ and $\sigma^2$, $E$ is proportional to $\sigma^2/\beta$.

There is an interesting relationship between the maximum intrinsic curvature $\Gamma^N$
(Bates and Watts, 1980) and the residual expectation (17). The maximum intrinsic
curvature can be written as

$$ \Gamma^N = \max_k\; s\sqrt{p}\;\frac{\|(I - \hat{P}_V)\,\ddot{f}_k\|}{\|\dot{f}_k\|^2} \tag{18} $$

where $\ddot{f}_k$ and $\dot{f}_k$ are the acceleration and velocity vectors, respectively, and the
maximum is taken over all directions $k = (k_i) \in R^p$. We have added the "hat" to
$P_V$ as a reminder that all derivatives involved in (18) are evaluated at the maximum
likelihood estimates. Evaluating $\Gamma^N$ for model (13) we find that

$$ \Gamma^N = \max_k\; s\sqrt{p}\;\frac{|\hat{\beta}|\, k_p^2\, \|(I - \hat{P}_V)\, g''(\hat{\gamma})\|}{k^T \hat{V}^T \hat{V} k}
   = |\hat{\beta}|\, \widehat{\mathrm{var}}(\hat{\gamma})\, \|(I - \hat{P}_V)\, g''(\hat{\gamma})\|\, \sqrt{p}/s . \tag{19} $$

From (17) and (19) it follows that

$$ \|\hat{E}\|/s = \Gamma^N/(2\sqrt{p}) \tag{20} $$

where $\hat{E}$ is $E$ evaluated at the maximum likelihood estimates. Thus, for model (13) the
maximum intrinsic curvature also describes the length of the vector of estimated residual
biases. This simple relation does not seem to hold for more complicated models.
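
Relation (20) can be checked numerically for a particular member of the class (13). The sketch below is an illustration under assumptions, not a general curvature routine: $Z$ is a column of ones, $g_i(\gamma) = \exp(-\gamma x_i)$ is a hypothetical choice of $g$, and the values playing the role of $\hat{\beta}$, $\hat{\gamma}$ and $s$ are invented. The curvature in a direction $k$ is evaluated from the velocity $Vk$ and the acceleration, and the maximum is taken at the direction proportional to the last column of $(V^TV)^{-1}$, with a random search as a sanity check.

```python
import numpy as np

# Hypothetical design and "estimates" (assumptions for illustration only).
x = np.linspace(0.5, 5.0, 12)
beta_hat, gamma_hat, s = 2.0, 0.7, 0.3

# Assumed g_i(gamma) = exp(-gamma * x_i); model (13) leaves g general.
g = np.exp(-gamma_hat * x)
g1 = -x * g                     # dg / dgamma
g2 = x**2 * g                   # d2g / dgamma2

V = np.column_stack([np.ones_like(x), g, beta_hat * g1])   # equation (14)
n, p = V.shape
M = np.linalg.inv(V.T @ V)
P = V @ M @ V.T                 # tangent plane projection

# Estimated residual bias, equation (17), with var(gamma_hat) = s^2 * M[p-1, p-1].
var_gamma = s**2 * M[-1, -1]
E_hat = -0.5 * beta_hat * var_gamma * (np.eye(n) - P) @ g2

def scaled_intrinsic_curvature(k):
    """s * sqrt(p) * ||(I - P) acceleration|| / ||velocity||^2 in direction k."""
    velocity = V @ k
    # Only the (beta, gamma) and (gamma, gamma) second derivatives are nonzero.
    acceleration = 2.0 * k[-2] * k[-1] * g1 + k[-1]**2 * beta_hat * g2
    normal_part = (np.eye(n) - P) @ acceleration
    return s * np.sqrt(p) * np.linalg.norm(normal_part) / (velocity @ velocity)

# The ratio k_p^2 / (k^T V^T V k) is maximized at k proportional to M[:, -1].
gamma_N = scaled_intrinsic_curvature(M[:, -1])

# Random directions should not exceed this value (up to rounding).
rng = np.random.default_rng(1)
random_max = max(scaled_intrinsic_curvature(k) for k in rng.normal(size=(2000, p)))

lhs = np.linalg.norm(E_hat) / s          # left side of (20)
rhs = gamma_N / (2.0 * np.sqrt(p))       # right side of (20)
```

The two sides `lhs` and `rhs` agree to rounding error for this example, and `random_max` stays below `gamma_N`.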

4.2  Power  Transformations  of  Explanatory  Variables. 

In  this  section  we  investigate  the  special  case  of  model  (13)  that  corresponds  to 
selecting  a  power  transformation  for  a  single  explanatory  variable  x  in  linear 
regression.  Emphasis  is  placed  on  obtaining  an  understanding  of  the  role  that  the  values 
of  x  have  in  determining  the  parameter  and  residual  biases. 

The problem of determining a power transformation for a single explanatory variable
can be approached in the framework of model (13) with

$$ g_i(\gamma) = g(x_i, \gamma) = (x_i^{\gamma} - 1)/\gamma . \tag{21} $$

We investigate the role of the $x_i$'s by modifying (21) to allow for a systematic
alteration of these values. Specifically, for $\gamma \neq 0$ we use $g(cx_i, \gamma)$, $c > 0$, while
for $\gamma = 0$ we find it more convenient to use $g(x_i^c, \gamma)$. The introduction of the known
constant $c$ is intended to allow for a comparison of a limited number of designs and
should not be confused with rescaling of a common design.

When $\gamma = 0$ and $g_i = g(x_i^c, \gamma)$ we find
$V = [Z,\ c\log x_i,\ \beta c^2 (\log x_i)^2/2]$ and $g''(x_i^c, 0) = c^3 (\log x_i)^3/3$. Next,
let $b^1 = (b_i^1)$ and $E^1$ denote the parameter and residual biases, respectively, when
$\sigma = \beta = c = 1$. Then a little algebra will verify that

$$ b_i = \sigma^2 b_i^1/(\beta c), \qquad i = 1,2,\ldots,p-2, $$

$$ b_{p-1} = \sigma^2 b_{p-1}^1/(\beta c^2) $$

and

$$ b_p = \sigma^2 b_p^1/(\beta^2 c^3) \tag{22} $$

where the biases $b_{p-1}$ and $b_p$ correspond to $\beta$ and $\gamma$, respectively. Further,

$$ E = \sigma^2 E^1/(\beta c) . \tag{23} $$

The results in (22) and (23) suggest that when $\gamma = 0$ increasing the ratio
$\max x_i/\min x_i$ will reduce all biases, with $\hat{\gamma}$ benefitting from the greatest
reduction. For example, comparing the designs $\{x_i\}$ and $\{x_i^2\}$, we see that the bias
for $\hat{\gamma}$ from $\{x_i\}$ will be 8 times larger than the bias from $\{x_i^2\}$.

When $\gamma = 0$ and $g_i = g(cx_i, \gamma)$, $E$ does not depend on $c$. For example the designs
$\{x_i\}$ and $\{2x_i\}$ will yield the same residual expectations when $\gamma = 0$.

When $\gamma \neq 0$ we have been unable to find an informative expression for $b$ when using
$g(cx_i, \gamma)$. However, when $Z$ contains a column of 1's it is relatively straightforward to
deal with $E$ since the column space of $V$ is the same as the column space of
$\tilde{V} = [Z,\ x_i^{\gamma},\ x_i^{\gamma}\log x_i]$, and except for the term
$c^{\gamma} x_i^{\gamma} (\log x_i)^2/\gamma$ all addends comprising $g''(\gamma) = (g''(cx_i, \gamma))$ are
contained in the column space of $\tilde{V}$. Thus, when constructing the projection in (17)
we may use $\tilde{V}$ and $\tilde{g}''(cx_i, \gamma) = c^{\gamma} x_i^{\gamma} (\log x_i)^2/\gamma$.
From this it follows that when $g_i = g(cx_i, \gamma)$

$$ E = \sigma^2 E^1/(\beta c^{\gamma}) . \tag{24} $$

Clearly, the effect of replacing $\{x_i\}$ with $\{cx_i\}$ depends strongly on $\gamma$ and $c$. For
example, when $\gamma > 0$ the intrinsic curvature for the design $\{2x_i\}$ will be smaller than
that for $\{x_i\}$, while when $\gamma < 0$ the reverse is true.
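
The scaling relations (22) and (23) for $\gamma = 0$ can be confirmed with a short computation. The sketch below assumes $Z$ is a single column of ones (so that $p = 3$) and uses the design $x_i = 2.2(1)21.2$; the particular values of $c$, $\beta$ and $\sigma$ are arbitrary. It builds $V$ and $g''$ from the expressions given above for $g(x_i^c, 0)$, computes $d$ from (15), $b$ from (3) and $E$ from (9), and compares the results with the reference case $\sigma = \beta = c = 1$.

```python
import numpy as np

def biases_at_gamma_zero(x, c, beta, sigma):
    """Parameter and residual biases for g(x^c, gamma) evaluated at gamma = 0."""
    logx = np.log(x)
    g = c * logx                         # g(x^c, 0)
    g1 = c**2 * logx**2 / 2.0            # dg / dgamma at gamma = 0
    g2 = c**3 * logx**3 / 3.0            # d2g / dgamma2 at gamma = 0
    V = np.column_stack([np.ones_like(x), g, beta * g1])   # Z assumed to be a column of ones
    M = np.linalg.inv(V.T @ V)
    var_gamma = sigma**2 * M[-1, -1]
    cov_beta_gamma = sigma**2 * M[-2, -1]
    d = -cov_beta_gamma * g1 - 0.5 * beta * var_gamma * g2  # equation (15)
    b = M @ V.T @ d                                         # equation (3)
    E = d - V @ b                                           # equation (9)
    return b, E

x = 2.2 + np.arange(20)                  # x_i = 2.2(1)21.2
c, beta, sigma = 2.0, 3.0, 0.5           # arbitrary illustrative values

b1, E1 = biases_at_gamma_zero(x, c=1.0, beta=1.0, sigma=1.0)   # reference case
b, E = biases_at_gamma_zero(x, c=c, beta=beta, sigma=sigma)

# Right-hand sides of (22) and (23).
b_predicted = np.array([
    sigma**2 * b1[0] / (beta * c),
    sigma**2 * b1[1] / (beta * c**2),
    sigma**2 * b1[2] / (beta**2 * c**3),
])
E_predicted = sigma**2 * E1 / (beta * c)
```

With $c = 2$ and the other constants held at 1, the third component of `b_predicted` is one eighth of the reference value, which is the factor of 8 quoted for the designs $\{x_i\}$ and $\{x_i^2\}$.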

5.  ILLUSTRATIONS 

In  this  section,  we  present  several  examples  to  illustrate  selected  results  of  the 
previous  sections. 

To illustrate the nature of the residual bias (9), we use model (13) with $g(\gamma)$
given by (21), $Z = 1$, $n = 20$ and $x_i = 2.2(1)21.2$. Plots of $E$ versus $x$ for
various values of $\gamma$ are shown in Figure 1. For reference, we also give the maximum
intrinsic curvature for $\sigma = \beta = 1$ and each value of $\gamma$. The values of $\Gamma^N$ and the
scales of the y-axes can be converted for other values of $\sigma$ and $\beta$, and certain other
values for the explanatory variable by using (20), (23) and (24). The plots of $E$ are
clearly patterned, but the display changes in both shape and magnitude as $\gamma$ varies. At
$\gamma = -1$, $\Gamma^N$ and the magnitudes of the residual expectations are relatively large, while
at $\gamma = 2$ the residual expectations seem unimportant.

Box  and  Hill  (1974)  describe  a  weighted  analysis  based  on  a  linearized  version  of  the 

model 


ya<«12  - 

1  1  +  Vil  +  92Xi2  +  93X13 


(25) 


for $i = 1,2,\ldots,24$. We use this model and the data provided by Box and Hill (1974) to
illustrate the effects of individual cases on the parameter bias. The ordinary least
squares estimate of $\theta^T = (\theta_0,\ldots,\theta_3)$ is $\hat{\theta}^T = (35.9, .071, .038, .167)$, and the
large sample estimates of the corresponding standard errors are 8.21, .179, .100 and .416,
respectively. The estimated bias vector, obtained by substituting estimates for
parameters in (3), is $\hat{b}^T = (1.997, .438, .245, 1.02)$. Several of these biases are large
relative to the parameter estimates, even when the standard errors are taken into account.

An index plot of the $\theta_3$ component of $\hat{b}_{(i)} - \hat{b}$ is given as Figure 2. Clearly,
cases 20 and 22 play a substantial role in reducing the bias of $\hat{\theta}_3$. Comparing Figure 2
with Figure 3, an index plot of the $\hat{h}_i$, we see that the three cases (20, 22 and 24)
with the largest effects on the bias of $\hat{\theta}_3$ also have the three largest values of
$\hat{h}_i$ (see equation 8). Removal of any of these three cases, particularly case 20, would
cause a substantial change in the agreement between the linear and quadratic approximations
of (25). This can be seen, for example, by inspecting the elements of $\hat{\delta}_{20}$, which are
substantial, or by comparing the curvatures with and without case 20: $\Gamma^N$ = .J88,
$\Gamma^T$ = 140.11, $\Gamma^N_{(20)}$ = .12 and $\Gamma^T_{(20)}$ = 352.77.


To illustrate the use of added variable plots associated with the regression in (3),
we use the Michaelis-Menten model $f_i = \theta_1 x/(\theta_2 + x)$ in combination with the data from
Bates and Watts (1980). However, for emphasis the first case (x = 2, y = .0615) is
deleted in this example so that there are n = 11 cases. The estimates based on these
data are $\hat{\theta}_1 = .08898$, $\hat{\theta}_2 = 1.3668$ and $s = .00425$. The large sample standard errors
for $\hat{\theta}_1$ and $\hat{\theta}_2$ are .0165 and .4079, respectively. Figure 4 gives the added variable
plot  for  8^  .  The  first  five  points  in  this  plot  have  two  replicates,  while  the  final 
point in the upper, right corner is replicated only once. The clear indication from this


figure  is  that  the  outlying  point  is  having  a  substantial  influence  on  the  bias  for  8,  • 
Further calculation confirms this indication: For the full (11 cases) data
$\hat{b}^T = (.0026, .074)$ while the bias without the outlying case (x = 2, y = .0527) and
evaluated at the estimates from the full data is $\hat{b}^T = (.0432, .898)$. Also, without the
outlying case $\hat{\theta}_1 = .0990$ and $\hat{\theta}_2 = 1.5693$, and the respective large sample standard
errors  are  .0808  and  1.656.  The  substantial  increase  in  the  standard  error  of  8^ 
partially  accounts  for  the  change  in  the  bias  of  8^  when  the  outlying  case  is  removed. 

6. DISCUSSION

The  development  of  diagnostic  methods  for  linear  regression  is  dependent  on  a 
thorough  study  and  characterization  of  the  exact  small  sample  behavior  of  a  few 
fundamental  building  blocks  such  as  the  ordinary  residuals  and  related  statistics.  In 
nonlinear regression, the small sample behavior of the corresponding building blocks is
generally  intractable  so  that  some  degree  of  approximation  is  necessary.  We  have  found 
the  various  quadratic  approximations  in  this  paper  to  be  useful  aids  for  understanding 
selected  aspects  of  nonlinear  regression  problems.  In  principle,  these  approximations  can 
be  extended  to  a  higher  accuracy,  although  the  practical  usefulness  of  such  extensions  is 
unclear. 

Recall that the residual expectations are strongly dependent on the intrinsic
curvature array $A^N$. The maximum intrinsic curvatures calculated by Bates and Watts (1980, Table
2),  and  Ratkowsky  (1983)  indicate  that  the  residual  expectations  will  generally  be 
negligible.  This  may  be  a  reflection  of  the  quality  of  published  studies  rather  than  a 
reflection  of  the  intrinsic  linearity  of  statistical  models  since  the  illustrations  in 
Section  5  show  that  there  is  no  intrinsic  reason  why  the  intrinsic  curvature  cannot  be 
large. 

Finally,  the  results  of  this  paper  are  presented  from  a  diagnostic  view,  but  may  also 
be  useful  in  other  contexts.  The  updating  analog  of  (8),  for  example,  may  be  useful  for 
searching the factor space to find a few additional runs that would substantially reduce
the  bias. 


APPENDIX 


Equivalence  of  Bias  Approximations 

To  show  the  equivalence  of  the  five  bias  approximations  discussed  in  Section  2,  we 
develop  Cox  and  Snell's  (1968)  result  for  model  (1).  The  approximations  of  Box  (1971), 
Clarke  (1980),  Hougaard  (1981)  and  Amari  (1982)  will  be  obtained  at  intermediate  steps  in 
this  development. 

Let $m^{rs}$, $r,s = 1,\ldots,p$, denote the elements of $M = (V^T V)^{-1}$ and let $\ell_i$ denote
the log likelihood corresponding to the i-th observation from model (1) so that the total
log likelihood is $\ell = \sum_i \ell_i$. Here and in the expressions that follow, summations
involving the index i are over the integers 1 to n, while summations involving any
other index are over the integers 1 to p. We assume that $\sigma^2$ is known when constructing
$\ell$.

Cox and Snell's (1968) bias approximation for $\hat{\theta}_s$, $s = 1,\ldots,p$, can now be written
as

$$ b_s = \frac{\sigma^4}{2} \sum_{r,t,u} m^{rs} m^{tu} (K_{rtu} + 2 J_{t,ru}) \tag{A.1} $$

where $K_{rtu} = E(\partial^3 \ell/\partial\theta_r \partial\theta_t \partial\theta_u)$ and
$J_{t,ru} = \sum_i E\{(\partial \ell_i/\partial\theta_t)(\partial^2 \ell_i/\partial\theta_r \partial\theta_u)\}$.
The approximation error for (A.1) is $o(n^{-1})$. Evaluating $K_{rtu}$ and $J_{t,ru}$ for model
(1), we find that

$$ K_{rtu} = -\frac{1}{\sigma^2} \sum_i \left( f_i^r f_i^{tu} + f_i^t f_i^{ru} + f_i^u f_i^{rt} \right) $$

and

$$ J_{t,ru} = \frac{1}{\sigma^2} \sum_i f_i^t f_i^{ru} . $$

Substituting these forms in (A.1) and simplifying yields

$$ b_s = -\frac{\sigma^2}{2} \sum_i \sum_{r,t,u} m^{rs} m^{tu} f_i^r f_i^{tu} . \tag{A.2} $$

Apart from trivial changes in notation, this is Hougaard's (1981) bias approximation.


Next, since

$$ \sum_{t,u} m^{tu} f_i^{tu} = \mathrm{tr}(M W_i) $$

equation (A.2) can be rewritten as

$$ b_s = -\frac{\sigma^2}{2} \sum_i \sum_r m^{rs} f_i^r\, \mathrm{tr}(M W_i) $$

or equivalently

$$ b = -\frac{\sigma^2}{2}\, M \sum_i v_i\, \mathrm{tr}(M W_i) \tag{A.3} $$

where $v_i^T$ is the i-th row of $V$. Apart from notation, equation (A.3) is the bias
approximation derived by Box (1971).

To obtain Clarke's (1980) expression for the bias, we first express (A.3) in full
matrix form

$$ b = M V^T d \tag{A.4} $$

where $d$ is an n-vector with elements $d_i = -\frac{\sigma^2}{2}\,\mathrm{tr}(M W_i)$. Next we express
(A.4) in terms of the QR-decomposition $V = UR$, where $R$ is a p x p nonsingular, upper
triangular matrix and $U$ is n x p with orthonormal columns, $U^T U = I$:

$$ b = L U^T d \tag{A.5} $$

where $L = R^{-1}$. The i-th element of $d$ can be represented as

$$ d_i = -\frac{\sigma^2}{2}\,\mathrm{tr}(L^T W_i L) = \sum_j d_{ij} $$

where $d_{ij}$ is the j-th diagonal element of $-\frac{\sigma^2}{2} L^T W_i L$. Finally, defining
$d_j$ to be an n-vector with elements $d_{ij}$, $i = 1,2,\ldots,n$, we obtain

$$ b = L \sum_j U^T d_j \tag{A.6} $$


which  is  Clarke's  (1980)  form.  In  Clarke's  notation,  fa  ■  fa  and  Bijj 

is  the  i-th  column  of  fa  .  Further,  in  the  notation  of  Bates  and  Watts 
2  T 

a, .  /s  p  •  0  d,  so  that,  apart  from  differences  in  notation,  Clarke's 

j  ~ 


(A.  6) 

T 

"  where  u^ 

(1980), 

form  is  the  same 
as that given by Bates and Watts.
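
The algebraic equivalence of the Box form, the full matrix form and the two QR-based forms can be confirmed numerically. In the sketch below the arrays `V` and `W` are arbitrary random placeholders (assumptions made for illustration, not derivatives of any model in this paper); `numpy.linalg.qr` supplies a factorization $V = UR$ with orthonormal columns in $U$, and $L = R^{-1}$.

```python
import numpy as np

# Arbitrary placeholder arrays (assumptions for illustration only).
rng = np.random.default_rng(0)
n, p, sigma2 = 12, 3, 0.5
V = rng.normal(size=(n, p))
W = rng.normal(size=(n, p, p))
W = 0.5 * (W + W.transpose(0, 2, 1))     # each face must be symmetric

M = np.linalg.inv(V.T @ V)
d = -0.5 * sigma2 * np.einsum("rs,isr->i", M, W)     # d_i = -(sigma^2/2) tr(M W_i)

# Box form (A.3): sum over cases of v_i * tr(M W_i).
b_A3 = -0.5 * sigma2 * M @ sum(V[i] * np.trace(M @ W[i]) for i in range(n))

# Full matrix form (A.4).
b_A4 = M @ V.T @ d

# QR form (A.5) with L = R^{-1}.
U, R = np.linalg.qr(V)                   # reduced QR: U is n x p, R is p x p
L = np.linalg.inv(R)
b_A5 = L @ U.T @ d

# Form (A.6): d_ij is the j-th diagonal element of -(sigma^2/2) L^T W_i L.
faces = -0.5 * sigma2 * np.einsum("ri,krs,sj->kij", L, W, L)
d_cols = np.einsum("kjj->kj", faces)     # column j is the n-vector d_j
b_A6 = L @ (U.T @ d_cols).sum(axis=1)    # L * sum_j U^T d_j
```

All four vectors agree to rounding error, reflecting the identities $M V^T = L U^T$ and $\mathrm{tr}(M W_i) = \mathrm{tr}(L^T W_i L)$.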

Amari (1982, Theorem 9) gives the bias of an efficient estimator from a curved
exponential family. For the maximum likelihood estimator $\hat{\theta}$, Amari's form immediately
reduces to

$$ b_s = -\frac{\sigma^4}{2} \sum_{r,t,u} m^{rs} m^{tu}\, \Gamma_{tur} \tag{A.7} $$

where $\Gamma_{tur}$ is the mixture connection. From the definition of the mixture connection, we
have

$$ \Gamma_{tur} = E\left( \frac{\partial^2 \ell}{\partial\theta_t \partial\theta_u} \frac{\partial \ell}{\partial\theta_r} \right)
   + E\left( \frac{\partial \ell}{\partial\theta_t} \frac{\partial \ell}{\partial\theta_u} \frac{\partial \ell}{\partial\theta_r} \right) . $$

For model (1) the second term of the mixture connection equals zero and it is not
difficult to verify that the first term is $\sum_i f_i^{tu} f_i^r/\sigma^2$. Substituting this into
(A.7) yields (A.2) so that for model (1) Amari's bias approximation is equivalent to the
four forms discussed above.

